 rotatable bond


994545b2308bbbbc97e3e687ea9e464f-Supplemental-Conference.pdf

Neural Information Processing Systems

In particular, torsional diffusion does not address the longstanding difficulty that existing cheminformatics methods have with macrocycles (rings with 12 or more atoms), which have found several applications in drug discovery [Driggers et al., 2008].


Flexible MOF Generation with Torsion-Aware Flow Matching

Kim, Nayoung, Kim, Seongsu, Ahn, Sungsoo

arXiv.org Artificial Intelligence

Designing metal-organic frameworks (MOFs) with novel chemistries is a longstanding challenge due to their large combinatorial space and complex 3D arrangements of the building blocks. While recent deep generative models have enabled scalable MOF generation, they assume (1) a fixed set of building blocks and (2) known local 3D coordinates of building blocks. However, this limits their ability to (1) design novel MOFs and (2) generate the structure using novel building blocks. We propose a two-stage MOF generation framework that overcomes these limitations by modeling both chemical and geometric degrees of freedom. First, we train a SMILES-based autoregressive model to generate metal and organic building blocks, paired with a cheminformatics toolkit for 3D structure initialization. Second, we introduce a flow matching model that predicts translations, rotations, and torsional angles to assemble the blocks into valid 3D frameworks. Our experiments demonstrate improved reconstruction accuracy, the generation of valid, novel, and unique MOFs, and the ability to create novel building blocks. Our code is available at https://github.com/nayoung10/MOFFlow-2.
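To make the torsional part of such a flow matching model concrete, the sketch below builds a training pair for torsion angles only. It assumes a geodesic (shortest-arc) interpolant on the circle, which the abstract does not specify; the function names `wrap_angle` and `torus_interpolant` are hypothetical, and translations and rotations are left out.

```python
import math


def wrap_angle(theta):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def torus_interpolant(tau0, tau1, t):
    """Return (tau_t, velocity) along the shortest arc from tau0 to tau1.

    tau0, tau1: lists of torsion angles (radians); t: time in [0, 1].
    The velocity is constant in t and is the regression target a
    flow matching model would be trained to predict at (tau_t, t).
    """
    # Shortest signed angular difference per torsion (assumed interpolant).
    velocity = [wrap_angle(b - a) for a, b in zip(tau0, tau1)]
    # Move a fraction t along that arc, then wrap back onto the circle.
    tau_t = [wrap_angle(a + t * v) for a, v in zip(tau0, velocity)]
    return tau_t, velocity


if __name__ == "__main__":
    # 3.0 rad and -3.0 rad are close on the circle, so the path crosses pi.
    print(torus_interpolant([3.0, 0.0], [-3.0, 1.0], 0.25))
```

Wrapping the difference before interpolating keeps the path short across the ±π boundary; a naive linear interpolation between 3.0 and -3.0 would instead sweep almost the whole circle.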


A Definitions Consider a molecular graph G = (V, E) and its space of possible conformers

Neural Information Processing Systems

Similar quantities are defined for atoms with other numbers of neighbors. See Appendix F.3 for additional In general there exist many possible such sets for a given molecular graph. With these preliminaries we now restate the proposition: Proposition 1. The calculation of Eq. 29 proceeds as follows. The conformer matching procedure, summarised in Algorithm 4, proceeds as follows.



Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching

Havens, Aaron, Miller, Benjamin Kurt, Yan, Bing, Domingo-Enrich, Carles, Sriram, Anuroop, Wood, Brandon, Levine, Daniel, Hu, Bin, Amos, Brandon, Karrer, Brian, Fu, Xiang, Liu, Guan-Horng, Chen, Ricky T. Q.

arXiv.org Artificial Intelligence

We introduce Adjoint Sampling, a highly scalable and efficient algorithm for learning diffusion processes that sample from unnormalized densities, or energy functions. It is the first on-policy approach that allows significantly more gradient updates than the number of energy evaluations and model samples, allowing us to scale to much larger problem settings than previously explored by similar methods. Our framework is theoretically grounded in stochastic optimal control and shares the same theoretical guarantees as Adjoint Matching, being able to train without the need for corrective measures that push samples towards the target distribution. We show how to incorporate key symmetries, as well as periodic boundary conditions, for modeling molecules in both Cartesian and torsional coordinates. We demonstrate the effectiveness of our approach through extensive experiments on classical energy functions, and further scale up to neural network-based energy models where we perform amortized conformer generation across many molecular systems. To encourage further research in developing highly scalable sampling methods, we plan to open-source these challenging benchmarks, where successful methods can directly impact progress in computational chemistry.


One-step Structure Prediction and Screening for Protein-Ligand Complexes using Multi-Task Geometric Deep Learning

He, Kelei, Dong, Tiejun, Wu, Jinhui, Zhang, Junfeng

arXiv.org Artificial Intelligence

Understanding the structure of the protein-ligand complex is crucial to drug development. Existing virtual structure measurement and screening methods are dominated by docking and its derived methods combined with deep learning. However, their sampling and scoring methodologies have largely restricted accuracy and efficiency. Here, we show that these two fundamental tasks can be accurately tackled with a single model, namely LigPose, based on multi-task geometric deep learning. By representing the ligand and the protein pair as a graph, LigPose directly optimizes the three-dimensional structure of the complex, with the learning of binding strength and atomic interactions as auxiliary tasks, enabling one-step prediction without docking tools. Extensive experiments show LigPose achieved state-of-the-art performance on major tasks in drug research. Its considerable improvements indicate a promising paradigm of AI-based pipeline for drug development.


Pre-training with Fractional Denoising to Enhance Molecular Property Prediction

Ni, Yuyan, Feng, Shikun, Hong, Xin, Sun, Yuancheng, Ma, Wei-Ying, Ma, Zhi-Ming, Ye, Qiwei, Lan, Yanyan

arXiv.org Artificial Intelligence

Deep learning methods have been considered promising for accelerating molecular screening in drug discovery and material design. Due to the limited availability of labelled data, various self-supervised molecular pre-training methods have been presented. While many existing methods utilize common pre-training tasks in computer vision (CV) and natural language processing (NLP), they often overlook the fundamental physical principles governing molecules. In contrast, applying denoising in pre-training can be interpreted as equivalent to force learning, but the limited noise distribution introduces bias into the molecular distribution. To address this issue, we introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence. In this way, the noise becomes customizable, allowing for incorporating chemical priors to significantly improve molecular distribution modeling. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks. The refined noise design enhances force accuracy and sampling coverage, which contribute to the creation of physically consistent molecular representations, ultimately leading to superior predictive performance.


Fractional Denoising for 3D Molecular Pre-training

Feng, Shikun, Ni, Yuyan, Lan, Yanyan, Ma, Zhi-Ming, Ma, Wei-Ying

arXiv.org Artificial Intelligence

Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which has been shown to be helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, namely low-coverage samples and an isotropic force field. The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noise on both dihedral angles and coordinates. However, denoising such hybrid noise in the traditional way is no longer equivalent to learning the force field. Through theoretical deductions, we find that the problem is caused by the dependence of the covariance on the input conformation. To this end, we propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which only denoises the latter coordinate part. In this way, Frad enjoys both the merits of sampling more low-energy structures and the force field equivalence. Extensive experiments show the effectiveness of Frad in molecular representation, with a new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.
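The hybrid noise idea can be illustrated on a toy four-atom chain with a single rotatable bond. The sketch below is not the authors' implementation: the helper names, the noise scales, and the restriction to one dihedral are assumptions made for illustration. It first perturbs the dihedral angle, then adds Gaussian coordinate noise, and returns only the coordinate noise as the denoising target.

```python
import math
import random


def sub(a, b): return [x - y for x, y in zip(a, b)]
def add(a, b): return [x + y for x, y in zip(a, b)]
def scale(a, s): return [x * s for x in a]
def dot(a, b): return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) of the chain p0-p1-p2-p3."""
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    b1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))
    v = sub(b0, scale(b1, dot(b0, b1)))  # b0 projected off the bond axis
    w = sub(b2, scale(b1, dot(b2, b1)))  # b2 projected off the bond axis
    return math.atan2(dot(cross(b1, v), w), dot(v, w))


def rotate_about_axis(p, origin, axis, angle):
    """Rodrigues rotation of point p about the line through origin along axis."""
    k = scale(axis, 1.0 / math.sqrt(dot(axis, axis)))
    v = sub(p, origin)
    c, s = math.cos(angle), math.sin(angle)
    rotated = add(add(scale(v, c), scale(cross(k, v), s)),
                  scale(k, dot(k, v) * (1.0 - c)))
    return add(origin, rotated)


def hybrid_noise(coords, sigma_torsion, sigma_coord, rng):
    """Return (noisy_coords, coord_noise, delta) for a 4-atom chain.

    The p1-p2 bond is treated as the only rotatable bond (toy assumption).
    """
    p0, p1, p2, p3 = coords
    # Step 1: dihedral noise, applied by rotating the atom beyond the bond.
    delta = rng.gauss(0.0, sigma_torsion)
    torsion_noised = [p0, p1, p2, rotate_about_axis(p3, p2, sub(p2, p1), delta)]
    # Step 2: independent Gaussian noise on every Cartesian coordinate.
    coord_noise = [[rng.gauss(0.0, sigma_coord) for _ in range(3)] for _ in range(4)]
    noisy = [add(p, e) for p, e in zip(torsion_noised, coord_noise)]
    # Only coord_noise would be regressed; delta is returned for inspection.
    return noisy, coord_noise, delta


if __name__ == "__main__":
    chain = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.5], [1.0, 0.0, 1.5]]
    noisy, target, delta = hybrid_noise(chain, 0.5, 0.04, random.Random(0))
    print(delta, target[0])
```

In this toy setup the dihedral perturbation widens the set of sampled conformations, while the regression target stays a plain Gaussian coordinate noise, which is the part the abstract says Frad denoises.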


Von Mises Mixture Distributions for Molecular Conformation Generation

Swanson, Kirk, Williams, Jake, Jonas, Eric

arXiv.org Artificial Intelligence

Molecules are frequently represented as graphs, but the underlying 3D molecular geometry (the locations of the atoms) ultimately determines most molecular properties. However, most molecules are not static and at room temperature adopt a wide variety of geometries or $\textit{conformations}$. The resulting distribution on geometries $p(x)$ is known as the Boltzmann distribution, and many molecular properties are expectations computed under this distribution. Generating accurate samples from the Boltzmann distribution is therefore essential for computing these expectations accurately. Traditional sampling-based methods are computationally expensive, and most recent machine learning-based methods have focused on identifying $\textit{modes}$ in this distribution rather than generating true $\textit{samples}$. Generating such samples requires capturing conformational variability, and it has been widely recognized that the majority of conformational variability in molecules arises from rotatable bonds. In this work, we present VonMisesNet, a new graph neural network that captures conformational variability via a variational approximation of rotatable bond torsion angles as a mixture of von Mises distributions. We demonstrate that VonMisesNet can generate conformations for arbitrary molecules in a way that is both physically accurate with respect to the Boltzmann distribution and orders of magnitude faster than existing sampling methods.
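As a small illustration of what a mixture of von Mises distributions over a torsion angle looks like, the sketch below evaluates the mixture density and draws samples using only the Python standard library. The mixture parameters (three modes near the gauche and anti positions) are made-up values, not outputs of VonMisesNet, and the function names are hypothetical.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def bessel_i0(x):
    """Modified Bessel function I0(x) via its power series (fine for moderate x)."""
    total, term, k = 0.0, 1.0, 0
    while term > 1e-17 * max(total, 1.0):
        total += term
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
    return total


def mixture_pdf(theta, weights, means, kappas):
    """Density of a von Mises mixture at angle theta (radians)."""
    return sum(
        w * math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))
        for w, mu, kappa in zip(weights, means, kappas)
    )


def sample_torsion(weights, means, kappas, rng):
    """Pick a component by weight, then draw from its von Mises distribution."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # random.vonmisesvariate expects the mean angle in [0, 2*pi).
    theta = rng.vonmisesvariate(means[k] % (2.0 * math.pi), kappas[k])
    return wrap_angle(theta)


if __name__ == "__main__":
    # Illustrative parameters only: three modes with equal concentration.
    weights = [0.5, 0.3, 0.2]
    means = [math.pi / 3, math.pi, -math.pi / 3]
    kappas = [8.0, 8.0, 8.0]
    rng = random.Random(0)
    print([round(sample_torsion(weights, means, kappas, rng), 3) for _ in range(5)])
    print(mixture_pdf(math.pi / 3, weights, means, kappas))
```

Because the von Mises density is periodic by construction, no wrapping correction is needed in the density itself; only the samples are mapped back to a single interval for convenience.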


Torsional Diffusion for Molecular Conformer Generation

Jing, Bowen, Corso, Gabriele, Chang, Jeffrey, Barzilay, Regina, Jaakkola, Tommi

arXiv.org Artificial Intelligence

Molecular conformer generation is a fundamental task in computational chemistry. Several machine learning approaches have been developed, but none have outperformed state-of-the-art cheminformatics methods. We propose torsional diffusion, a novel diffusion framework that operates on the space of torsion angles via a diffusion process on the hypertorus and an extrinsic-to-intrinsic score model. On a standard benchmark of drug-like molecules, torsional diffusion generates superior conformer ensembles compared to machine learning and cheminformatics methods in terms of both RMSD and chemical properties, and is orders of magnitude faster than previous diffusion-based models. Moreover, our model provides exact likelihoods, which we employ to build the first generalizable Boltzmann generator.
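A diffusion process on the hypertorus can be sketched for independent torsion angles as follows. The example assumes a wrapped normal perturbation kernel, which is one common choice for angular data and is not stated in the abstract above; the function names and the truncation parameter `n_wraps` are hypothetical. It perturbs torsions with wrapped Gaussian noise and evaluates the score of the wrapped normal, the quantity a score model would be trained to match.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def perturb_torsions(torsions, sigma, rng):
    """Add Gaussian noise to each torsion and wrap back onto the circle."""
    return [wrap_angle(t + rng.gauss(0.0, sigma)) for t in torsions]


def wrapped_normal_score(delta, sigma, n_wraps=10):
    """d/d(delta) of log sum_d exp(-(delta + 2*pi*d)^2 / (2*sigma^2)).

    The infinite sum over windings d is truncated to |d| <= n_wraps.
    """
    shifted = [delta + 2.0 * math.pi * d for d in range(-n_wraps, n_wraps + 1)]
    log_w = [-(s * s) / (2.0 * sigma * sigma) for s in shifted]
    # Subtract the max exponent so small sigma does not underflow every term.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    numerator = sum(-s / (sigma * sigma) * wi for s, wi in zip(shifted, w))
    return numerator / sum(w)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.0, 2.5, -1.0]
    noisy = perturb_torsions(clean, 0.8, rng)
    # Training targets: score of the kernel at the wrapped displacement.
    targets = [wrapped_normal_score(wrap_angle(n - c), 0.8) for n, c in zip(noisy, clean)]
    print(noisy, targets)
```

For small `sigma` the score reduces to the ordinary Gaussian score `-delta / sigma**2`; for large `sigma` the contributions from other windings flatten it toward zero, reflecting a nearly uniform distribution on the circle.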